| Dataset | Year | Programming Language | Data Source | Download Link |
|---|---|---|---|---|
| BigCloneBench | 2014 | Java | GitHub | Download |
| OJ dataset | 2016 | C++ | OJ Platform | Download |
| CodeSearchNet | 2019 | Go, Java, JavaScript, PHP, Python, Ruby | GitHub | Download |
| Code2Seq | 2019 | Java | GitHub | Download |
| Devign | 2019 | C | GitHub | Download |
| Google Code Jam (GCJ) | 2020 | C++, Java | OJ Platform | Download |
| CodeXGLUE | 2021 | Go, Java, JavaScript, PHP, Python, Ruby | GitHub | Download |
| CodeQA | 2021 | Java, Python | GitHub | Download |
| APPS | 2021 | Python | OJ Platform | Download |
| Shellcode_IA32 | 2021 | Assembly language | OJ Platform | Download |
| SecurityEval | 2022 | Python | GitHub | Download |
| LLMSecEval | 2023 | Python, C | GitHub | Download |
| PoisonPy | 2023 | Python | GitHub | Not yet published |

| Attack Technique | Year | Venue | Attack Type | Target Models | Target Tasks |
|---|---|---|---|---|---|
| Ramakrishnan et al. | 2020 | arXiv | Data poisoning | Code2Seq, Seq2Seq | Code summarization, Method name prediction |
| Schuster et al. | 2021 | USENIX Security | Data poisoning, Model poisoning | Pythia, GPT-2 | Code completion |
| Severi et al. | 2021 | USENIX Security | Data poisoning | LightGBM, EmberNN, Random Forest, Linear SVM | Malware classification |
| CodePoisoner | 2022 | arXiv | Data poisoning | LSTM, TextCNN, Transformer, CodeBERT | Code defect detection, Code clone detection, Code repair |
| Wan et al. | 2022 | ESEC/FSE | Data poisoning | BiRNN, Transformer, CodeBERT | Code search |
| BADCODE | 2023 | ACL | Data poisoning | CodeBERT, CodeT5 | Code search |
| Cotroneo et al. | 2023 | arXiv | Data poisoning | Seq2Seq, CodeBERT, CodeT5+ | Code generation |
| AFRAIDOOR | 2023 | arXiv | Data poisoning | CodeBERT, CodeT5, PLBART | Code summarization |
| PELICAN | 2023 | USENIX Security | Data poisoning | BiRNN-func, XDA-func, XDA-cell, StateFormer, EKLAVYA, EKLAVYA++, in-nomine, in-nomine++, S2V, S2V++, Trex, SAFE, SAFE++, S2V-B, S2V-B++ | Binary code analysis |
| Li et al. | 2023 | ACL | Model poisoning | PLBART, CodeT5 | Code defect detection, Code clone prediction, Code2Code translation, Text2Code translation, Code refinement |
| BadCS | 2023 | arXiv | Model poisoning | BiRNN, Transformer, CodeBERT, GraphCodeBERT | Code search |